Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics
Abstract
The k-nearest neighbour (k-NN) technique is a simple, interpretable and intuitively appealing method for addressing classification problems. However, choosing an appropriate distance function for k-NN can be challenging, and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on the area under the Receiver Operating Characteristics (ROC) curve, which is a well-known measure of the quality of binary classifiers. It computes weights for the distance function based on ROC properties within an appropriate neighbourhood of the instances whose distance is being computed. We experimentally compare our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique is even able to deliver better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
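The abstract does not give the weighting formula, so the following is only a rough sketch of the general idea of deriving distance weights from ROC behaviour. It makes a simplifying assumption that differs from the paper: each feature receives one global weight, 2·|AUC − 0.5|, computed over the whole training set, whereas the paper computes weights within a neighbourhood of the instances being compared. All function names (`auc`, `feature_weights`, `weighted_distance`, `knn_predict`) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve for binary labels (0/1) ranked by scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined with one class; treat as uninformative
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def feature_weights(X, y):
    """Assumed weighting: 2 * |AUC - 0.5| per feature, in [0, 1].

    A feature that ranks the classes perfectly gets weight 1,
    a feature that ranks them no better than chance gets weight 0.
    """
    n_features = len(X[0])
    return [
        2.0 * abs(auc([row[j] for row in X], y) - 0.5)
        for j in range(n_features)
    ]


def weighted_distance(a, b, w):
    """Euclidean distance with per-feature weights w."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))


def knn_predict(X, y, query, w, k=3):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(X)), key=lambda i: weighted_distance(X[i], query, w))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: feature 0 separates the classes, feature 1 does not.
X_train = [[0.0, 1.0], [0.1, 3.0], [0.2, 5.0],
           [1.0, 0.0], [1.1, 3.0], [1.2, 6.0]]
y_train = [0, 0, 0, 1, 1, 1]

weights = feature_weights(X_train, y_train)
prediction = knn_predict(X_train, y_train, [0.15, 6.0], weights, k=3)
```

On this toy data the uninformative second feature ends up with weight 0, so it no longer contributes to the distance. A local variant in the spirit of the abstract would recompute the weights from the neighbourhood of each pair of instances rather than once for the whole training set.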
Similar resources
Class-Based Attribute Weighting for Time Series Classification
In this paper, we present two novel class-based weighting methods for the Euclidean nearest neighbor algorithm and compare them with global weighting methods considering empirical results on a widely accepted time series classification benchmark dataset. Our methods provide higher accuracy than every global weighting in nearly half of the cases and they have better overall performance. We concl...
Nearest Neighbour Distance Matrix Classification
Distance-based classification is one of the popular methods for classifying instances, using a point-to-point distance based on the nearest neighbour or k-nearest neighbour (k-NN). The distance measure can be any of the various measures available (e.g. Euclidean distance, Manhattan distance, Mahalanobis distance or other specific distance measures). In this paper, we propose ...
Hesitant Fuzzy k-Nearest Neighbour (HFK-NN) Classifier for Document Classification and Numerical Result Analysis
This paper presents a new approach, Hesitant Fuzzy k-nearest neighbour (HFK-NN) based document classification, together with an analysis of numerical results. The proposed HFK-NN classification approach is based on hesitant fuzzy distance. In this paper we have used hesitant fuzzy distance calculations to obtain document classification results. The following steps are used for classif...
An Empirical Comparison of Weighting Functions for Multi-label Distance-Weighted K-nearest Neighbour Method
Multi-label classification is an extension of classical multi-class classification, where any instance can be associated with several classes simultaneously and thus the classes are no longer mutually exclusive. It was experimentally shown that the distance-weighted k-nearest neighbour (DWkNN) algorithm is superior to the original kNN rule for multi-class learning. But, it has not been investigated whethe...
Some improvements on NN based classifiers in metric spaces
The nearest neighbour (NN) and k-nearest neighbour (k-NN) classification rules have been widely used in Pattern Recognition due to their simplicity and good behaviour. Exhaustive nearest neighbour search may become impractical when facing large training sets, high dimensional data or expensive dissimilarity measures (distances). In recent years a lot of fast NN search algorithms have been d...